automatic relevance determination
Scalable Bayesian GPFA with automatic relevance determination and discrete noise models
Latent variable models are ubiquitous in the exploratory analysis of neural population recordings, where they allow researchers to summarize the activity of large populations of neurons in lower-dimensional 'latent' spaces. Existing methods can generally be categorized into (i) Bayesian methods that facilitate flexible incorporation of prior knowledge and uncertainty estimation, but which typically do not scale to large datasets; and (ii) highly parameterized methods without explicit priors that scale better but often struggle in the low-data regime. Here, we bridge this gap by developing a fully Bayesian yet scalable version of Gaussian process factor analysis (bGPFA), which models neural data as arising from a set of inferred latent processes with a prior that encourages smoothness over time. Additionally, bGPFA uses automatic relevance determination to infer the dimensionality of neural activity directly from the training data during optimization. To enable the analysis of continuous recordings without trial structure, we introduce a novel variational inference strategy that scales near-linearly in time and also allows for non-Gaussian noise models appropriate for electrophysiological recordings.
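The abstract above describes a generative model in which smooth latent GP trajectories are scaled by automatic relevance determination (ARD) parameters and mapped to discrete spike counts. A minimal numpy sketch of that generative structure (illustrative only; names such as `ard_scales` and the chosen lengthscale are assumptions, not the authors' implementation):

```python
# Minimal sketch (not the authors' code) of a GPFA-style generative model with ARD:
# latent trajectories are smooth GP samples over time, each scaled by an ARD
# parameter s_d; latents whose s_d shrinks toward zero are effectively pruned.
import numpy as np

rng = np.random.default_rng(0)
T, n_neurons, n_latents = 200, 30, 5
times = np.linspace(0.0, 10.0, T)

def rbf_kernel(t, lengthscale=1.0, jitter=1e-6):
    d = t[:, None] - t[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2) + jitter * np.eye(len(t))

K = rbf_kernel(times, lengthscale=1.5)
L = np.linalg.cholesky(K)

# ARD scales: two latents are given near-zero scale, mimicking pruned dimensions.
ard_scales = np.array([1.0, 0.8, 0.5, 1e-3, 1e-3])

# Smooth latent trajectories X (n_latents x T), each a GP sample scaled by s_d.
X = ard_scales[:, None] * (L @ rng.standard_normal((T, n_latents))).T

# Linear readout to neurons, then a discrete (Poisson) noise model on spike counts.
C = rng.standard_normal((n_neurons, n_latents)) / np.sqrt(n_latents)
b = np.log(5.0 / T) * np.ones((n_neurons, 1))   # baseline log-rate
rates = np.exp(C @ X + b)                        # firing rates per time bin
spikes = rng.poisson(rates)                      # simulated spike counts

print("effective latent dimensionality:", int(np.sum(ard_scales > 1e-2)))
```

Latents whose ARD scale is driven toward zero contribute nothing to the firing rates, which is the mechanism by which the dimensionality of neural activity can be inferred during optimization.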
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Utah (0.04)
- Europe > Netherlands (0.04)
Efficient Network Automatic Relevance Determination
Zhang, Hongwei, Ye, Ziqi, Wang, Xinyuan, Guo, Xin, Xu, Zenglin, Cheng, Yuan, Hu, Zixin, Qi, Yuan
We propose Network Automatic Relevance Determination (NARD), an extension of ARD for linear probabilistic models, to simultaneously model sparse relationships between inputs $X \in \mathbb R^{d \times N}$ and outputs $Y \in \mathbb R^{m \times N}$, while capturing the correlation structure among the $Y$. NARD employs a matrix normal prior which contains a sparsity-inducing parameter to identify and discard irrelevant features, thereby promoting sparsity in the model. Algorithmically, it iteratively updates both the precision matrix and the relationship between $Y$ and the refined inputs. To mitigate the computational inefficiency of the $\mathcal O(m^3 + d^3)$ cost per iteration, we introduce Sequential NARD, which evaluates features sequentially, and a Surrogate Function Method, which leverages an efficient approximation of the marginal likelihood and simplifies the calculation of the determinant and inverse of an intermediate matrix. Combining the Sequential update with the Surrogate Function method further reduces computational costs. The computational complexity per iteration for these three methods is reduced to $\mathcal O(m^3+p^3)$, $\mathcal O(m^3 + d^2)$, and $\mathcal O(m^3+p^2)$, respectively, where $p \ll d$ is the final number of features in the model. Our methods demonstrate significant improvements in computational efficiency with comparable performance on both synthetic and real-world datasets.
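For context, classical ARD on a single-output linear-Gaussian model assigns each weight its own prior precision and updates those precisions by evidence maximization, pruning features whose precision diverges; NARD extends this idea to multi-output $Y$ with a matrix normal prior. A minimal sketch of the classical single-output case (an illustration of ARD itself, not the paper's NARD algorithm; the fixed noise precision `beta` and the thresholds are assumptions):

```python
# Minimal sketch of classical ARD for y = X w + noise: each weight w_j gets a
# zero-mean Gaussian prior with its own precision alpha_j, and the alphas are
# updated by evidence maximization; weights with diverging alpha_j are pruned.
import numpy as np

rng = np.random.default_rng(1)
N, d = 200, 20
X = rng.standard_normal((N, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 0.7]                 # only three relevant features
y = X @ w_true + 0.1 * rng.standard_normal(N)

alpha = np.ones(d)                            # per-feature prior precisions
beta = 100.0                                  # noise precision (assumed known here)

for _ in range(50):
    # Posterior over weights given current alphas (Bayesian linear regression).
    S = np.linalg.inv(np.diag(alpha) + beta * X.T @ X)   # posterior covariance
    m = beta * S @ X.T @ y                                # posterior mean
    # Evidence-based ARD update: alpha_j = gamma_j / m_j^2, gamma_j = 1 - alpha_j S_jj.
    gamma = 1.0 - alpha * np.diag(S)
    alpha = np.minimum(gamma / (m ** 2 + 1e-12), 1e10)   # cap to avoid overflow

kept = np.where(alpha < 1e4)[0]
print("retained features:", kept)             # expected: [0 1 2]
print("their weights:", np.round(m[kept], 2))
```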
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
- Asia > China > Shanghai > Shanghai (0.05)
- North America > United States > Pennsylvania (0.04)
- (3 more...)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Dermatology (0.67)
- Health & Medicine > Therapeutic Area > Oncology > Colorectal Cancer (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Probabilistic Block Term Decomposition for the Modelling of Higher-Order Arrays
Hinrich, Jesper Løve, Mørup, Morten
Tensors or multi-way arrays naturally occur in practically all areas of science, including psychology (i.e., human responses to questionnaire data according to scoring criteria of different objects), chemometrics (i.e., excitation and emission spectra across samples), biology (i.e., genetic expression of cell profiles across time and experimental conditions), and knowledge representations (i.e., entity-entity relationships across predicates); see also [1] and references therein. To analyze these multi-way arrays while accounting for their higher-order structure, tensor decompositions have become important tools to characterize and discover structure in these data; see [2, 1] for details. Tensor decompositions have historically focused on maximum likelihood estimation methods to obtain a point estimate to decompose the data, most predominantly based on a Gaussian likelihood (least squares estimation). Recently, there has been a rise in the development of Bayesian inference for tensor data, initially focusing on binary or count data, but now applied more broadly to various types of data; for an overview see [3, 4]. The benefits of a Bayesian approach are that it characterizes the decomposition solution as a distribution, the so-called posterior distribution, which allows characterization of the uncertainty, whereas priors act as regularizers, adding robustness and preventing issues of degeneracy. Additionally, it provides a principled way to incorporate a priori information. For a review of maximum likelihood based and Bayesian tensor decomposition, see [2] and [3], respectively. The two most common tensor decomposition methods are the Canonical Polyadic Decomposition/PARAFAC (CPD) and the Tucker model. The CPD model represents the data through a sum of outer-product rank-1 terms (i.e., separate multi-linear structures), whereas Tucker uses a multi-linear rank decomposition (i.e., with "connected" multi-linear structures).
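The CPD model mentioned in the last sentence writes a tensor as a sum of rank-1 outer products. A minimal numpy sketch of that deterministic reconstruction (illustrative only; the paper's probabilistic block term decomposition instead places priors on the factors and infers them with Bayesian inference):

```python
# Minimal sketch of a rank-R Canonical Polyadic Decomposition (CPD/PARAFAC) of a
# 3-way array: the tensor is a sum of R outer products of factor-matrix columns.
import numpy as np

rng = np.random.default_rng(2)
I, J, K, R = 10, 8, 6, 3                       # tensor dimensions and CP rank

# Factor matrices, one per mode.
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))

# Reconstruct the tensor: X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r].
X = np.einsum('ir,jr,kr->ijk', A, B, C)

# Equivalent sum of R rank-1 terms, showing the "sum of outer products" view.
X_rank1 = sum(np.einsum('i,j,k->ijk', A[:, r], B[:, r], C[:, r]) for r in range(R))
print(np.allclose(X, X_rank1))                 # True
```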
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Europe > Albania > Durrës County (0.04)
- Africa > Senegal > Kolda Region > Kolda (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)